
    GCG: Mining Maximal Complete Graph Patterns from Large Spatial Data

    Recent research on pattern discovery has progressed from mining frequent patterns and sequences to mining structured patterns, such as trees and graphs. Graphs, as a general data structure, can model complex relations among data, with wide applications in web exploration and social networks. However, mining large graph patterns is challenging because of the large number of subgraphs. In this paper, we aim to mine only frequent complete graph patterns. A graph g in a database is complete if every pair of distinct vertices is connected by a unique edge. Grid Complete Graph (GCG) is a mining algorithm developed to explore interesting pruning techniques for extracting maximal complete graphs from the large spatial dataset in the Sloan Digital Sky Survey (SDSS) data. Using a divide-and-conquer strategy, GCG shows high efficiency, especially in the presence of a large number of patterns. In this paper, we describe GCG, which can mine not only simple co-location spatial patterns but also complex ones. To the best of our knowledge, this is the first algorithm to exploit the extraction of maximal complete graphs in the process of mining complex co-location patterns in a large spatial dataset.
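
The completeness definition in the abstract above can be illustrated with a minimal sketch. This is not the GCG algorithm; the function name `is_complete` and the edge-list representation are assumptions made for illustration only.

```python
from itertools import combinations


def is_complete(vertices, edges):
    """Return True if every pair of distinct vertices is joined by an edge."""
    # Store edges as unordered pairs so (a, b) and (b, a) count as the same edge.
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset((u, v)) in edge_set for u, v in combinations(vertices, 2))


# A triangle is complete; a three-vertex path is not.
triangle = is_complete(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")])
path = is_complete(["a", "b", "c"], [("a", "b"), ("b", "c")])
print(triangle, path)
```

Checking all pairs costs time quadratic in the number of vertices, which is acceptable for a single candidate pattern but says nothing about how GCG prunes the search space.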

    NEW METHODS FOR MINING SEQUENTIAL AND TIME SERIES DATA

    Data mining is the process of extracting knowledge from large amounts of data. It covers a variety of techniques aimed at discovering diverse types of patterns on the basis of the requirements of the domain. These techniques include association rule mining, classification, cluster analysis and outlier detection. The availability of applications that produce massive amounts of spatial, spatio-temporal (ST) and time series data (TSD) is the rationale for developing specialized techniques to mine such data. In spatial data mining, the spatial co-location rule problem is different from the association rule problem, since there is no natural notion of transactions in spatial datasets that are embedded in continuous geographic space. Therefore, we have proposed an efficient algorithm (GridClique) to mine interesting spatial co-location patterns (maximal cliques). These patterns are used as the raw transactions for an association rule mining technique to discover complex co-location rules. Our proposal includes certain types of complex relationships, especially negative relationships, in the patterns. The relationships can be obtained from only the maximal clique patterns, which have never been used until now. Our approach is applied to a well-known astronomy dataset obtained from the Sloan Digital Sky Survey (SDSS). ST data is continuously collected and made accessible in the public domain. We present an approach to mine and query large ST data with the aim of finding interesting patterns and understanding the underlying process of data generation. An important class of queries is based on the flock pattern. A flock is a large subset of objects moving along paths close to each other for a predefined time. One approach to processing a “flock query” is to map ST data into high-dimensional space and to reduce the query to a sequence of standard range queries that can be answered using a spatial indexing structure; however, the performance of spatial indexing structures rapidly deteriorates in high-dimensional space. This thesis sets out a preprocessing strategy that uses a random projection to reduce the dimensionality of the transformed space. We use probabilistic arguments to prove the accuracy of the projection and present experimental results that show the possibility of managing the curse of dimensionality in an ST setting by combining random projections with traditional data structures. In time series data mining, we devised a new space-efficient algorithm (SparseDTW) to compute the dynamic time warping (DTW) distance between two time series, which always yields the optimal result. This is in contrast to other approaches, which typically sacrifice optimality to attain space efficiency. The main idea behind our approach is to dynamically exploit the existence of similarity and/or correlation between the time series: the more similar the time series, the less space is required to compute the DTW between them. Other techniques for speeding up DTW impose a priori constraints and do not exploit similarity characteristics that may be present in the data. Our experiments demonstrate that SparseDTW outperforms these approaches. We discover an interesting pattern by applying the SparseDTW algorithm: “pairs trading” in a large stock-market dataset of daily index prices from the Australian stock exchange (ASX) from 1980 to 2002.
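
For readers unfamiliar with the DTW distance mentioned above, the following is a minimal sketch of the classic full-matrix dynamic program. It is not SparseDTW: it always allocates the whole cost matrix, which is exactly the space cost the thesis aims to reduce. The function name `dtw_distance` and the absolute-difference local cost are assumptions made for illustration.

```python
def dtw_distance(s, t):
    """Classic dynamic-programming DTW with |a - b| as the local cost."""
    n, m = len(s), len(t)
    inf = float("inf")
    # cost[i][j] holds the DTW distance between the prefixes s[:i] and t[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(s[i - 1] - t[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Repeating a sample in one series costs nothing, because DTW can warp time.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

This baseline uses time and space proportional to the product of the two series lengths.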

    Data Gathering with Tour Length-Constrained

    In this paper, given a single mobile element and a time deadline, we investigate the problem of designing the mobile element tour to visit a subset of nodes, such that the length of this tour is bounded by the time deadline and the communication cost between nodes outside and inside the tour is minimized. The nodes that the mobile element tour visits work as cache points that store the data of the other nodes. Several algorithms in the literature have tackled this problem by separating it into two phases: the construction of the mobile element tour and the computation of the forwarding trees to the cache points. In this paper, we propose algorithmic solutions that alternate between these phases and iteratively improve the outcome of each phase based on the result of the other. We compare the resulting performance of our solutions with that of previous work.

    Enumeration of Maximal Clique for Mining Spatial Co-location Patterns

    In this paper we present a systematic approach to mine co-location patterns in Sloan Digital Sky Survey (SDSS) data. SDSS Data Release 5 contains 3.6 TB of data. The availability of such a large amount of useful data is an obvious opportunity for applying data mining techniques to generate interesting information. The major reason for the lack of such data mining applications in SDSS is the unavailability of data in a suitable format. In this paper we present a procedure to get additional galaxy types from available attributes and transform the data into maximal cliques of galaxies, which in turn can be used as transactions for data mining applications. Our main contribution is an efficient algorithm, GridClique, that generates maximal cliques from large spatial databases. It should be noted that the general problem of extracting all maximal cliques from a graph is known to be NP-hard. Our experiments show that the GridClique algorithm successfully generates all the maximal cliques in the SDSS data and enables the generation of useful co-location patterns.
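
As a point of reference for maximal clique enumeration, here is a minimal sketch of the textbook Bron-Kerbosch recursion (without pivoting). It is not GridClique, whose grid-based details are not given in the abstract; the function name `maximal_cliques` and the adjacency-dictionary input are assumptions made for illustration.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques of an undirected graph {vertex: set_of_neighbors}."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: already-processed vertices.
        if not p and not x:
            cliques.append(sorted(r))  # nothing can extend r, so r is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# A triangle 1-2-3 with a pendant vertex 4 attached to 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(maximal_cliques(graph))  # [[1, 2, 3], [3, 4]]
```

Each reported clique could then serve as one transaction for association rule mining, in the spirit of the pipeline the abstract describes.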

    Dimensionality Reduction for Long Duration and Complex Spatio-Temporal Queries

    From the tracking of moose in Sweden to the movement of traffic in a large metropolis, spatio-temporal data is continuously being collected and made available in the public domain. This provides an opportunity to mine and query spatio-temporal data with the purpose of finding substantial patterns and understanding the underlying data-generating process. An important class of queries is based on the flock pattern. A flock is a large subset of objects moving along paths close to each other for a certain predefined time. The standard approach to processing a “flock query” is to map spatio-temporal data into a high-dimensional space and reduce the query to a sequence of standard range queries which can be answered using a spatial indexing structure. However, as is well known, the performance of spatial indexing structures deteriorates drastically in high-dimensional space. In this paper we propose a preprocessing strategy which consists of using a random projection to reduce the dimensionality of the transformed space. We prove an “ε-δ” probabilistic approximation which results from the projection and present experimental results which show, for the first time, the possibility of breaking the curse of dimensionality in a spatio-temporal setting.
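
The random projection step can be sketched as follows. The abstract does not specify how the projection matrix is built, so the Gaussian entries, the 1/sqrt(k) scaling, and the names `random_projection_matrix` and `project` are all assumptions made for illustration, not the paper's construction.

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """Build a k x d matrix of scaled Gaussian entries (assumed construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)  # keeps squared lengths unchanged in expectation
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, point):
    """Map a d-dimensional point to k dimensions by a matrix-vector product."""
    return [sum(w * x for w, x in zip(row, point)) for row in matrix]


# Reduce a hypothetical 100-dimensional trajectory encoding to 10 dimensions.
matrix = random_projection_matrix(d=100, k=10)
low_dim = project(matrix, [1.0] * 100)
print(len(low_dim))
```

Because the map is linear, the difference of two projected points equals the projection of their difference, so range queries in the reduced space approximate those in the original space up to the distortion the projection introduces.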

    Efficient Relational Techniques for Processing Graph Queries

    Graphs are widely used for modeling complicated data such as social networks, chemical compounds, protein interactions and the semantic web. To effectively understand and utilize any collection of graphs, a graph database that efficiently supports elementary querying mechanisms is crucially required. For example, subgraph and supergraph queries are important types of graph queries which have many applications in practice. A primary challenge in computing the answers to graph queries is that pair-wise comparisons of graphs are usually hard problems. Relational database management systems (RDBMSs) have repeatedly been shown to be able to efficiently host different types of data such as complex objects and XML data. RDBMSs derive much of their performance from sophisticated optimizer components which make use of physical properties that are specific to the relational model, such as sortedness, proper join ordering and powerful indexing mechanisms. In this article, we study the problem of indexing and querying graph databases using the relational infrastructure. We present a purely relational framework for processing graph queries. This framework relies on building a layer of graph feature knowledge which captures metadata and summary features of the underlying graph database. We describe different querying mechanisms which make use of this layer to achieve scalable performance for processing graph queries. Finally, we conduct an extensive set of experiments on real and synthetic datasets to demonstrate the efficiency and scalability of our techniques.